Recursive checker implementation. #225
I'm not from etcd, but I've incorporated parts of this patch into my private fork. Thanks.
@ptabor Let's resurrect this PR. Could you please rebase it to resolve the conflict?
[scheduling this on my TODO queue]
Force-pushed: 8d1bd97 to bdade3c
Resurrected, but I still need to add a test.
Force-pushed: 0d5dd9a to 6bc4f3a
Force-pushed: f58b8ea to 6ee6169
Force-pushed: 8f17a73 to 6e95e9f
The PR has tests and resolved conflicts.
Force-pushed: 6e95e9f to 0b0bb5f
tx_check.go (outdated)
		maxKeyInSubtree = tx.recursivelyCheckPagesInternal(elem.pgid, elem.key(), maxKey, pagesStack, keyToString, ch)
		runningMin = maxKeyInSubtree
	}
	return
You need to define maxKey outside the for loop.
Suggested change:
- return
+ return maxKey
Why not return maxKeyInSubtree, which is implicit?
Right, we should return maxKeyInSubtree. For the rightmost element in a branch page, the maxKey in the page it links to should be the largest key in the tree rooted at the current branch page.
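The point about the implicit return rests on Go's named result parameters: a bare return hands back whatever the named result currently holds. The following is a minimal sketch of that behavior, assuming a hypothetical toy `node` type and a hypothetical `maxKeyOf` helper rather than bbolt's real page structures. It also shows why the maximum of the rightmost child is the maximum of the whole subtree.

```go
package main

import (
	"bytes"
	"fmt"
)

// node is a hypothetical stand-in for a page: a leaf holds sorted keys,
// a branch holds children ordered left to right.
type node struct {
	keys     [][]byte
	children []*node
}

// maxKeyOf returns the largest key in the subtree rooted at n.
// maxKeyInSubtree is a named result, so a bare return hands it back.
func maxKeyOf(n *node) (maxKeyInSubtree []byte) {
	if len(n.children) == 0 {
		if len(n.keys) > 0 {
			maxKeyInSubtree = n.keys[len(n.keys)-1]
		}
		return
	}
	for _, child := range n.children {
		// After the loop this holds the maximum of the rightmost child.
		maxKeyInSubtree = maxKeyOf(child)
	}
	return
}

func main() {
	tree := &node{children: []*node{
		{keys: [][]byte{[]byte("a"), []byte("b")}},
		{keys: [][]byte{[]byte("c"), []byte("d")}},
	}}
	fmt.Println(bytes.Equal(maxKeyOf(tree), []byte("d"))) // prints true
}
```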
Force-pushed: 0b0bb5f to 49e4182
internal/tests/tx_check_test.go (outdated)
	xRay := surgeon.NewXRay(db.Path())

	path1, err := xRay.FindPathsToKey([]byte("0451"))
	require.NoError(t, err, "Cannot find page that contains key:'0451'")
nit: better to lowercase the first letter when a message is wrapped by another utility (github.com/stretchr/testify in this case) instead of being output directly. This comment applies to the similar messages that follow as well.
Suggested change:
- require.NoError(t, err, "Cannot find page that contains key:'0451'")
+ require.NoError(t, err, "cannot find page that contains key:'0451'")
Done
internal/tests/tx_check_test.go (outdated)
import (
	"fmt"
	"github.com/stretchr/testify/require"
Please reorder this import item.
Done
tx_check.go (outdated)
// - Are in right ordering relationship to their parents.
// `pagesStack` is expected to contain IDs of pages from the tree root to `pgid` for the clean debugging message.
func (tx *Tx) recursivelyCheckPagesInternal(
	pgid pgid, minKeyClosed, maxKeyOpen []byte, pagesStack []pgid,
pgid pgid: the name is the same string as the type. I know it works, but I'd suggest avoiding it to make the code clearer and less confusing.
How about changing it to something like pid pgid?
Done: pgId pgid. The name pid sounds too much like a process ID.
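A small sketch of why the rename matters in practice. The `pgid` definition and the `describe` function below are assumptions made for illustration, not bbolt's actual code. When a parameter shares its type's name, the type is shadowed inside the function body, so conversions such as `pgid(0)` no longer compile there.

```go
package main

import "fmt"

// pgid mirrors the name of the page-ID type discussed above; this
// definition is an assumption for the sake of a self-contained example.
type pgid uint64

// describe names its parameter pgId, so the type name pgid stays usable
// inside the body. Had the parameter been called pgid, the conversion
// pgid(0) below would not compile, because pgid would refer to a variable.
func describe(pgId pgid) string {
	if pgId == pgid(0) {
		return "page zero"
	}
	return fmt.Sprintf("page %d", pgId)
}

func main() {
	fmt.Println(describe(42))
}
```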
tx_check.go (outdated)
	for i := range p.leafPageElements() {
		elem := p.leafPageElement(uint16(i))
		if i == 0 && runningMin != nil && compareKeys(runningMin, elem.key()) > 0 {
			ch <- fmt.Errorf("The first key[%d]=(hex)%s on leaf page(%d) needs to be >= the key in the ancestor (%s). Stack: %v",
Suggested change:
- ch <- fmt.Errorf("The first key[%d]=(hex)%s on leaf page(%d) needs to be >= the key in the ancestor (%s). Stack: %v",
+ ch <- fmt.Errorf("the first key[%d]=(hex)%s on leaf page(%d) needs to be >= the key in the ancestor (%s). Stack: %v",
Done
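The lowercase convention pays off once an error is wrapped, because the message then appears mid-sentence. A minimal sketch, assuming a hypothetical sentinel `errKeyOrder` and a hypothetical helper `checkPage` that are not part of the PR:

```go
package main

import (
	"errors"
	"fmt"
)

// errKeyOrder is a hypothetical sentinel error, lowercased so that it reads
// naturally when a caller adds its own prefix.
var errKeyOrder = errors.New("the first key on the leaf page is smaller than the ancestor key")

// checkPage is a hypothetical helper that wraps the sentinel with page context.
func checkPage(id uint64) error {
	return fmt.Errorf("page %d: %w", id, errKeyOrder)
}

func main() {
	err := checkPage(618)
	// The wrapped message stays lowercase in the middle of the line.
	fmt.Println("check failed:", err)
	fmt.Println(errors.Is(err, errKeyOrder))
}
```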
In [...] So the existing logic in [...] I would suggest implementing a common function or method to be shared by both cases.
Signed-off-by: Piotr Tabor <ptab@google.com>
Recursive checker confirms database consistency with respect to b-tree key order constraints:
- keys on pages must be sorted
- keys on children pages are between 2 consecutive keys on parent branch page.
Signed-off-by: Piotr Tabor <ptab@google.com>
Force-pushed: 73d3338 to eb0deb9
LGTM
// because of caching. This overhead can be removed if running on a read-only
// transaction, however, it is not safe to execute other writer transactions at
// the same time.
func (tx *Tx) Check(kvStringer KVStringer) <-chan error {
This is a breaking change. It changes the signature of the Check method, which previously took no parameters. FYI: etcd-io/etcd@87c0da8
The proposals:
1. keep it as it is, and add a breaking-change note when we release v1.3.7;
2. revert the change, add a separate method, something like CheckWithStringer(kvString KVString), and use a default KVStringer (e.g. bolt.HexKVStringer) when calling Check();
3. allow users to pass a nil KVStringer, and use a default one in that case.
"2" seems to be the safest solution, because it has no impact on users.
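Recursive checker confirms database consistency with respect to b-tree key order constraints:
- keys on pages must be sorted
- keys on children pages are between 2 consecutive keys on the parent branch page.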
This PR moves the checker logic to a separate tx_checker file, as the logic is already substantial (200).
It does not modify the original logic apart from running recursivelyCheckPages.
The reason for creating this stronger checker is a data corruption issue in etcd that I was investigating, which led to the following branch state:
...
Such a state was considered correct by the previous checker.
The new checker reports:
key (0: 000000000000014d5f0000000000000000) on the branch page(618) needs to be < than key of the next element reachable from the ancestor (000000000000014d5f0000000000000000). Pages stack: [56 618]
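The ordering constraints described above can be sketched on a toy in-memory tree. This is an illustrative sketch only: the `page` type, the `check` function, and the error wording are assumptions, not the PR's implementation, and it assumes every branch page has exactly one child per key. Each page's keys must be strictly increasing and fall inside the half-open range inherited from its ancestors.

```go
package main

import (
	"bytes"
	"fmt"
)

// page is a hypothetical in-memory stand-in for a database page. A leaf has
// only keys; a branch has one child per key, and keys[i] is the smallest
// key allowed in children[i].
type page struct {
	id       int
	keys     [][]byte
	children []*page
}

// check verifies that keys on p are strictly increasing and lie in
// [minClosed, maxOpen), then recurses into children with narrowed bounds.
// A nil bound means unbounded. Violations are appended to errs.
func check(p *page, minClosed, maxOpen []byte, stack []int, errs *[]error) {
	stack = append(stack, p.id)
	for i, k := range p.keys {
		if i > 0 && bytes.Compare(p.keys[i-1], k) >= 0 {
			*errs = append(*errs, fmt.Errorf("key %x on page(%d) is not greater than its predecessor. Pages stack: %v", k, p.id, stack))
		}
		if minClosed != nil && bytes.Compare(k, minClosed) < 0 {
			*errs = append(*errs, fmt.Errorf("key %x on page(%d) needs to be >= %x. Pages stack: %v", k, p.id, minClosed, stack))
		}
		if maxOpen != nil && bytes.Compare(k, maxOpen) >= 0 {
			*errs = append(*errs, fmt.Errorf("key %x on page(%d) needs to be < %x. Pages stack: %v", k, p.id, maxOpen, stack))
		}
	}
	for i, child := range p.children {
		// The upper bound for a child is the next key on this page, or the
		// bound inherited from the ancestor for the rightmost child.
		childMax := maxOpen
		if i+1 < len(p.keys) {
			childMax = p.keys[i+1]
		}
		check(child, p.keys[i], childMax, stack, errs)
	}
}

func main() {
	root := &page{id: 56, keys: [][]byte{[]byte("a"), []byte("m")}, children: []*page{
		{id: 618, keys: [][]byte{[]byte("a"), []byte("m")}}, // "m" violates the open upper bound
		{id: 9, keys: [][]byte{[]byte("m"), []byte("z")}},
	}}
	var errs []error
	check(root, nil, nil, nil, &errs)
	for _, e := range errs {
		fmt.Println(e)
	}
}
```

A check that only looks at each page in isolation would accept this tree, since the keys on page 618 are sorted; the violation only shows up once the parent's bounds are carried down.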